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Abstract. In this contribution, we consider the problem of blind source separation in a Bayesian 
estimation framework. The wavelet representation allows us to assign an adequate prior distribution 
to the wavelet coefficients of the sources. MCMC algorithms are implemented to test the validity of 



the proposed approach, and the non linear approximation of the wavelet transform is exploited to 
\ aleviate the algorithm. 

INTRODUCTION 

>Y 

We find applications of blind source separation (BSS) in many fields of data analysis: 
Qh| chemistry, medical imaging (EEG, MEG), seismic data analysis and astronomical imag- 

ing. Many solutions have been developped to try to solve this problem: Independant 
Component Analysis (ICA) [3, 6], maximum likelihood estimation [5], and methods 
based on second or higher order statistics of the signals [1, 2]. These methods have 
proved their efficiency in many applications, however they do not apply for noisy obser- 
vations models. 

A different approach has been considered to solve the BSS problem, we find in 
[13, 11, 8] an introductory analysis of the problem in a Bayesian estimation framework. 
Some of the methods outlined earlier can be reformulated via the Bayes rule, and a 
similar formalism can be obtained. 

In this contribution, we treat the BSS problem in a Bayesian estimation framework. 
As in previous works on this subject [10, 7], the problem is transported to a transform 
domain: the wavelet domain. The advantage of such an approach is that some invertible 
transforms restructure the data, leaving them structures simpler to model, and this, as 
will be seen later, is useful in the formulation of the problem as an inference problem. 
^ The paper is organized as follows: In section-II we present the BSS problem, write the 

associated equations and introduce the Bayesian solution of the problem. In section-Ill, 
we transport the problem to a transformed data space (wavelet) and give the justification 
for that approach. In section-IV, we present the associated MCMC-based optimization 
algorithm. We consider then the non-linear approximation of the wavelet transform to 
introduce a denoising procedure by some thresholding rule. At the end, we conclude and 
give future perspectives of the present work. 



BAYESIAN BLIND SOURCE SEPARATION (BBSS) 



Blind source separation (BSS) consists of recovering unobservable sources from a set of 
their instantaneous and linear mixtures. The direct observational model can be described 
by: 

x(t) = As(t) + e(t), teC (1) 

where C = {Z : time series signals, 1? : 2D images}, x(t) t=1: ,„ jT is the observed m- 
column vector, s(£) t=1> .. . jT is the unknown sources n-column vector, A is the (m x n) 
full rank matrix, and e(t) t =i ,r is the noise m-column vector. 

The Bayesian approach to BSS starts by writing the posterior distribution of the 
sources, jointly with the mixing matrix and any other parameters needed to describe 
the problem: 

P (s, A, R e | x) oc P (x | s, A, R e ) n(s,A, R e ) (2) 

where P (s,A,R £ \x) is the joint posterior distribution of the unknown sources, the 
mixing matrix and the noise covariance matrix. P (x\s,A,R e ) is the likelihood of the 
observed data x and ir(s, A, R e ) is the prior distribution that should reflect the prior 
information we may have about s, A and R e . The noise e(t) is assumed Gaussian, 
spatially independant and temporarily white: Af(0,R e ), with R e = diag (of , o^J. 

An important step in Bayesian inference problems is to assign appropriate expressions 
to 7r (s, A, R e ). The likelihood P [x |s, A, R t ) is determined by the hypotheses made on 
the noise e(t). It is reasonable to assume that the sources, the mixing matrix and the noise 
variance are independant: 

7t(s,A,R £ ) =7l(s)7l(A)7l(R e ) (3) 

The prior distribution n (A) can be determined by some physical knowledge of the mix- 
ing mechanism. In our work, the mixing matrix is assigned a Gaussian prior distribution: 

n(A) = Y[tf(a ij \iJi ij ,o? j ) (4) 

The appropriate selection of prior distributions is still a subject of intensive research. We 
find in [16, 12] some interesting work on this topic. We thus define, as results of these 
works, a Gamma prior distribution for the inverse of the variances: 

n(x) = g(x\a,9)(xx a - 1 e-v (5) 

Some work has been done on BSS [17, 15], by assigning a mixture of Gaussians prior 
to the sources: 

L L 

n ( s ) = ^2piNi (s\m,n) , ^2pi = 1 ( 6 ) 
i=i i=i 

This distribution is very interesting, any distribution n* (s) can be well aproximated by 
a Gaussian mixture distribution, and the higher L (number of Gaussians), the better the 
approximation, but the higher the complexity of the associated model. The difficulty lies 
then in how L should be chosen to well approximate the distribution with reasonable 
complexity ?. We note that for L Gaussians, we need (3L — 1) parameters (p,/x,r) to 
totally define the mixture. 



BBSS IN THE WAVELET DOMAIN 



An idea, that has been exploited with success, is to treat the problem in a tranform 
domain. We find in [14] a proposed solution to a spectral BSS problem. In [10, 7], a 
first approach to the problem has been treated in the wavelet domain. The particular 
properties of these transforms: linearity and inversibility makes that the BSS problem is 
formulated in a similar manner and that we can go back and forth without any difficulty. 
The BSS problem described by equation (1) is rewritten in the wavelet domain as: 

wi(k)=Awi(k)+w>(k) keC,j = l,...,J (7) 

where C = {Z : time series signals, 1? : 2D images}, and : 

wi(k)= <s(t),tp j (t-k)> = J s{t)i) j {t-k)dt (8) 

where ip j (t) = 2~ j / 2 ip (2~H). We point out to the fact that the statistical properties of the 
noise does not change in the wavelet doamin: 

e(t) ~jV(0,af) =>■ w{{k) ~A/"(0,of) (9) 

We will refer by w 3 s (k),w{(k) and w{(k) to the wavelet coefficients vectors of s(t),x(t) 
and e(t) at resolution j, respectively. The /c-index will be dropped to aleviate the 
expressions since w J s and w{ are temporarily white, and thus w J s (k) and w J s define 
identically the same vector unless specified. 
The posterior distribution of the new unknowns is now given by: 

P (wi,A, R e | w{) oc P {w{\wi,A, R e ) n (w*) n (A) vr (R £ ) (10) 

The wavelet transform has some particular properties that make it interesting for 
Bayesian formulation of the BSS problem: 

locality each wavelet atom ipj (t — k) is localised in time and frequency, 
edge detection a wavelet coefficient is significant if and only if an irregularity is present 
within the support of the wavelet atom. 

These two properties have a great impact on the wavelet (lD/2D)-statistical signal 
processing. The wavelet coefficients can be reasonably considered uncorrelated due to 
locality (we say that the wavelet transform acts as decorrelator), and assigned a separable 
probabilty distribution: 

7r{U^)}=n^KW) < U ) 

j,k j,k 

The second property (edge detection) has a consequence on the type of the distribution 
we will assign to the wavelet coefficients: 

The wavelet transform of natural sources results in a large number of small 
coefficients, and a small number of large coefficients. 



This property (sparsity) is shown in Figure (1). The prior distribution of the wavelet 
coefficients is then very well approximated by centered, peaky and heavy tailed like 
distributions. Mallat has porposed in [9] to model the wavelet coefficients by generalized 
exponential distributions: 

P{.) = K£xp(^~\.\ aS J , 7 >0,l<a<2 (12) 

Crouse in [4] has assigned to the wavelet coefficients a Gaussian mixture distribution to 
capture the sparsity characteristic: 

P(.)= P M(.\0,T L ) + (1 -p) M (.\0,T H ) , T H » T L (13) 

where p = Prob.(wavelet coefficient G low energy state). In the sequel, we will only 
emphasize on the Gaussian mixture model. For the generalized exponential case, we 
refer to [7]. Note that we choose a two Gaussian mixture model with a total number of 
parameters equals to three. 



MCMC IMPLEMENTATION 

Once we have defined the priors and properly written the posterior distribution 
P (w 3 s , A,R e \wQ, we define a posterior estimates of the different parameters that 
characterizes the BSS problem. To do this, we will generate samples from the joint 
distribution (10), by means of MCMC algorithms (Monte Carlo Markov Chain methods) 
and than choose the posterior means as estimates. 



Hidden variables 

The conditional posterior distribution of the sources coefficients is a mixture of 
Gaussians of the type: 

P(wi\w{,A,R e ) cxAf(wi\Awi,R e )ir(wi\0) (14) 

where 

n n L 

* h\o) = ]> (<w = nz^wi ' 7 *) 

i i I 

where i stands for the i-th source. The complexity of such model increases with in- 
creasing n (for a 2-Gaussians wavelet model, a total of (2L — l) n = 3 n parameters has 
to be defined in order to describe the model). Thus the introduction of a label variable 
z° E {1, . . . , L} n = {1, 2} n = {Low state, High state}™ and a conditional parametrisation 
of the form: 

7r(wi\e,z j G [L,H]) =Af(wi\0,T [L , H] ) (15) 



with P(z j e L) = p L , and P(z j G H) = p H = 1 -p L . 



The MCMC Algorithm 

The hidden variables 



1. z 3 



P{z>\wi,0) = j P(zi,wi\wi,0) 

Jw s 

= rr(z j ) j Af(wi\Awi,R e )7r(wi\z j ,R T ) 

where it (w j s | z j , R T ) = Af (w 3 s 1 0, _R r ) , and R T = diag (ti , .. ., r n ) . 
The sources wavelet coefficients 

2. w j s ~ P{wi\w{,z j ,e) = M(wi\Awi,R € )Af(wl\0,R T ) 

= N(w j s \n s , z ,Rs/ z ) 

where n s/z = Rs^R^A^wi, and R s/z = (A^R^A + R; 1 ) ~\ 
The mixing matrix 

3. A~p(A\w{,0) = N(wi\Awi,R e )N(A\fi a ,R a ) 

= N(A\n A) R A ) 

where vec 1 (fji A ) = R A ( (-R7 1 ® ^) wee (C M ) + Ma) . #a = (R7 1 ® C ss + #a X )> 

£j, fc «M 1 and c^^E^K^- 

The hyperparameters 



4. ~ P(G\wi,w 3 s ,A) =P(w{\wi, A, 6)71(6) 

where stands for the the noise covariance matrix R e and the mixture parameters 
R T = diag (ri, r n ) (variances of the Gaussians in the mixture). 
The noise covariance 

4.g. 4~P(4|<X^) = A/" j [Awi]i, cr^)lQ (o\. j 2, l) 

= JC? (ofja^j) , i = l,...,m 

where a = T/2 + 2 and 1/0* = ( £ fc « - [A^];) 2 /2 + l) . 
The Gaussians variances 

4.6. ij[L,H] ~ P(r/|<) = Ar«|0,7f)X^(r/|2,l) 

= J^(7f|^,^), t = l,..,n 

where = + 2 and 1/0? = «.I ( ^ =0 ) 2 /2 + l), Z = {L,H}. 



vec(.) is the vector representation of a matrix. 



The prior probabilities 

5 - IpIl^pIh} ~ p {PiL,PiH\Q) = ^2(ui + n iL ,u 2 + n iH ), i = l,...,n 

where nu = J2k^-( z j =iy an( ^ ^2 (71,72) stands for the Dirichlet distribution with param- 
eters (71,72) for the probability variables (pl,Ph = 1 ~Pl)- 



SIMULATION RESULTS 



To verify the plausibilty of the proposed algorithm, we have made some tests on simu- 
lated data (128 x 128 pixels). In figure 2. a, we present an aerial image and a cloud image 
that were linearily mixed to obtain the observed data in figure 2.b. The mixing matrix is 
of the form: 

" .91 .49 
.42 .87 



A 



The signal to noise ratio is of 20dB. The Symmlet-4 wavelet basis has been chosen (with 
4 vanishing moments). The obtained estimates of the sources are presented in figure 2.c. 
The evolution of the estimates of the elements of the matrix is presented in figure 3, 
where the empirical posterior mean is found to be: 

.92 .51 
.39 .86 

To quantify the estimates of the sources, we choose a distance that is invariant under 
a scale transformation (since the sources are estimated up to a scale factor): 



<J( Sl (f),s 2 (t)) = l- 



< s 1 (t),s 2 (t) > 

II Sill -IN II 



(16) 



where < .,. > and |.| stand for the scalar product and the L 2 norm respectively. 5 is 
positif and upper bounded by 1. 

In order to quantify the estimates of the mixing matrix, we measure the observation 
distance defined by: 



1 



5 A = —^25(xi(t),Xi(t)) 



(17) 



where x(t) = As(t) and x(t) = As(t). In the simulated example, 5a = 2.28 x 10" 



NON LINEAR MCMC IMPLEMENTATION 

The implementation of the proposed MCMC algorithm is modified by making use of the 
non linear approximation of the wavelet transfrorm: 

/mH = Yl < f^j> k > ^A n ] ( lg ) 

{j,k}ei M 



where 1m corresponds to the largest coefficients, and Jm\p\ is the non linear approxi- 
mation of f[n] by the M largest coefficients. It is implemented by applying some non 
linear function to the wavelet coefficients of the form: 



T(</,^,*>) = { J 



<f,ipj,k> for | <f,if>j,k> I >X 
elsewhere 



known as hard thresholding. We define equivalently the soft thresholding by: 



T(</,^,*>) = { J 



<f,ipj,k>~X for |</,^j,k>|>X 
elsewhere 



In step 1 of the MCMC algorithm, the hidden variable z j is sampled from the 
posterior probability P(z j \.). The non linear approximation procedure consists then 
of sampling only the coefficients that are large (in a high energy state), that corresponds 
to z j e H: 

i. z j ~ p(z j \wi,e) = f p(z j ,wi\wi,e) 

= tt(z j ) / P(wi\Aw J s ,R e )7r(wi\z j ,R T ) 

Jw s 

= [ post l, post H ] n ^z J e{L,H} n 
the sampling of the sources coefficients with a thresholding procedure is then: 

wi\(z j = L) = 

wi\(z j = H) = M(wi\0,r H ) 



2. 



We point out to the fact that we do not have to specify the threshold x, which is a hard 
task by itself, it is automatically set by the classification of the coefficients into Low 
energy coefficients and High energy coefficients. This additional procedure allows to 
have estimates free from any residual noise, as will be seen in the simulations, and the 
whole algorithm could be described as separation/denoising algorithm. 

The non linear step has been applied to the same data set, the estimation results are 
presented in figure 4 and the estimated mixing matrix is: 



A = 



.89 .51 
.46 .86 



and the observation distance 5a = 9.16 x 10~ 4 . 

The algorithm has been tested on ID signals and the results presented in figure 6 show 
the effect of the non linear MCMC implementation on denoising the estimates. In figure 
7, a second example is presented where the additional information brought by this non 
linear procedure is very apparent in the sense that it helps for separating the sources in a 
very noisy environment. 



SUMMARY AND PERSPECTIVES 



In this work we presented a Bayesian aproach to blind source separation in the wavelet 
domain. The main interest to try to solve the problem in the wavelet domain is to be 
able to use a simpler probabilistic model for the sources i.e. a two component Gaussian 
mixture model with a total of three parameters as opposed to a 3L — 1 parametric model 
in the direct model with L undetermined. Indeed, the interpretation of the mixture model 
as a heirarchical hidden variable model gives us the ability to apply some automatic 
thresholding rule to the wavelet coefficients. Finally, we showed some performances of 
the proposed method on simulated data. 

Concerning our perspectives, we follow essentially these directions: 

i) a quad tree Markovian modeling of the wavelet coefficients to account for inter- 
scale correlation. 

ii) an adaptative basis selection criteria to improve the thresholding procedure. 
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FIGURE 1. Sparsity property of the wavelet coefficients: a. aerial image, b. histogramme of image (a), 
c. the wavelet transform of image (a), d. histograms of the wavelet coefficients in the different bands (c) 
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